deep gcn
Towards a deeper GCN: Alleviate over-smoothing with iterative training and fine-tuning
Peng, Furong, Gao, Jinzhen, Lu, Xuan, Liu, Kang, Huo, Yifan, Wang, Sheng
Graph Convolutional Networks (GCNs) suffer from severe performance degradation in deep architectures due to over-smoothing. While existing studies primarily attribute the over-smoothing to repeated applications of graph Laplacian operators, our empirical analysis reveals a critical yet overlooked factor: trainable linear transformations in GCNs significantly exacerbate feature collapse, even at moderate depths (e.g., 8 layers). In contrast, Simplified Graph Convolution (SGC), which removes these transformations, maintains stable feature diversity up to 32 layers, highlighting linear transformations' dual role in facilitating expressive power and inducing over-smoothing. However, completely removing linear transformations weakens the model's expressive capacity. To address this trade-off, we propose Layer-wise Gradual Training (LGT), a novel training strategy that progressively builds deep GCNs while preserving their expressiveness. LGT integrates three complementary components: (1) layer-wise training to stabilize optimization from shallow to deep layers, (2) low-rank adaptation to fine-tune shallow layers and accelerate training, and (3) identity initialization to ensure smooth integration of new layers and accelerate convergence. Extensive experiments on benchmark datasets demonstrate that LGT achieves state-of-the-art performance on vanilla GCN, significantly improving accuracy even in 32-layer settings. Moreover, as a training method, LGT can be seamlessly combined with existing methods such as PairNorm and ContraNorm, further enhancing their performance in deeper networks. LGT offers a general, architecture-agnostic training framework for scalable deep GCNs. The code is available at [https://github.com/jfklasdfj/LGT_GCN].
Deeper Insights into Deep Graph Convolutional Networks: Stability and Generalization
Yang, Guangrui, Li, Ming, Feng, Han, Zhuang, Xiaosheng
Several pioneering works [3, 4] introduced the initial concept of graph neural networks (GNNs), incorporating recurrent mechanisms and necessitating neural network parameters to define contraction mappings. Concurrently, Micheli [5] introduced the neural network for graphs, commonly referred to as NN4G, over a comparable timeframe. It is worth noting that the NN4G diverges from recurrent mechanisms and instead employs a feed-forward architecture, exhibiting similarities to contemporary GNNs. In recent years, (contemporary) GNNs have gained significant attention as an effective methodology for modeling graph data [6-11]. To obtain a comprehensive understanding of GNNs and deep learning for graphs, we refer the readers to relevant survey papers for an extensive overview [12-15]. Among the various GNN variants, one of the most powerful and frequently used GNNs is graph convolutional networks (GCNs). A widely accepted perspective posits that GCNs can be regarded as an extension or generalization of traditional spatial filters, which are commonly employed in Euclidean data analysis, to the realm of non-Euclidean data. Due to its success on non-Euclidean data, GCN has attracted widespread attention on its theoretical exploration.
Asymmetric Co-Training with Explainable Cell Graph Ensembling for Histopathological Image Classification
Yang, Ziqi, Li, Zhongyu, Liu, Chen, Luo, Xiangde, Wang, Xingguang, Xu, Dou, Li, Chaoqun, Qin, Xiaoying, Yang, Meng, Jin, Long
Convolutional neural networks excel in histopathological image classification, yet their pixel-level focus hampers explainability. Conversely, emerging graph convolutional networks spotlight cell-level features and medical implications. However, limited by their shallowness and suboptimal use of high-dimensional pixel data, GCNs underperform in multi-class histopathological image classification. To make full use of pixel-level and cell-level features dynamically, we propose an asymmetric co-training framework combining a deep graph convolutional network and a convolutional neural network for multi-class histopathological image classification. To improve the explainability of the entire framework by embedding morphological and topological distribution of cells, we build a 14-layer deep graph convolutional network to handle cell graph data. For the further utilization and dynamic interactions between pixel-level and cell-level information, we also design a co-training strategy to integrate the two asymmetric branches. Notably, we collect a private clinically acquired dataset termed LUAD7C, including seven subtypes of lung adenocarcinoma, which is rare and more challenging. We evaluated our approach on the private LUAD7C and public colorectal cancer datasets, showcasing its superior performance, explainability, and generalizability in multi-class histopathological image classification.
Old can be Gold: Better Gradient Flow can Make Vanilla-GCNs Great Again
Jaiswal, Ajay, Wang, Peihao, Chen, Tianlong, Rousseau, Justin F., Ding, Ying, Wang, Zhangyang
Despite the enormous success of Graph Convolutional Networks (GCNs) in modeling graph-structured data, most of the current GCNs are shallow due to the notoriously challenging problems of over-smoothening and information squashing along with conventional difficulty caused by vanishing gradients and over-fitting. Previous works have been primarily focused on the study of over-smoothening and over-squashing phenomena in training deep GCNs. Surprisingly, in comparison with CNNs/RNNs, very limited attention has been given to understanding how healthy gradient flow can benefit the trainability of deep GCNs. In this paper, firstly, we provide a new perspective of gradient flow to understand the substandard performance of deep GCNs and hypothesize that by facilitating healthy gradient flow, we can significantly improve their trainability, as well as achieve state-of-the-art (SOTA) level performance from vanilla-GCNs. Next, we argue that blindly adopting the Glorot initialization for GCNs is not optimal, and derive a topology-aware isometric initialization scheme for vanilla-GCNs based on the principles of isometry. Additionally, contrary to ad-hoc addition of skip-connections, we propose to use gradient-guided dynamic rewiring of vanilla-GCNs} with skip connections. Our dynamic rewiring method uses the gradient flow within each layer during training to introduce on-demand skip-connections adaptively. We provide extensive empirical evidence across multiple datasets that our methods improve gradient flow in deep vanilla-GCNs and significantly boost their performance to comfortably compete and outperform many fancy state-of-the-art methods. Codes are available at: https://github.com/VITA-Group/GradientGCN.
SkipNode: On Alleviating Performance Degradation for Deep Graph Convolutional Networks
Lu, Weigang, Zhan, Yibing, Lin, Binbin, Guan, Ziyu, Liu, Liu, Yu, Baosheng, Zhao, Wei, Yang, Yaming, Tao, Dacheng
Graph Convolutional Networks (GCNs) suffer from performance degradation when models go deeper. However, earlier works only attributed the performance degradation to over-smoothing. In this paper, we conduct theoretical and experimental analysis to explore the fundamental causes of performance degradation in deep GCNs: over-smoothing and gradient vanishing have a mutually reinforcing effect that causes the performance to deteriorate more quickly in deep GCNs. On the other hand, existing anti-over-smoothing methods all perform full convolutions up to the model depth. They could not well resist the exponential convergence of over-smoothing due to model depth increasing. In this work, we propose a simple yet effective plug-and-play module, SkipNode, to overcome the performance degradation of deep GCNs. It samples graph nodes in each convolutional layer to skip the convolution operation. In this way, both over-smoothing and gradient vanishing can be effectively suppressed since (1) not all nodes perform full convolutions up to the model depth and, (2) the gradient can be directly passed back through ``skipped'' nodes. We provide both theoretical analysis and empirical evaluation to demonstrate the efficacy of SkipNode and its superiority over SOTA baselines.
Revisiting "Over-smoothing" in Deep GCNs
Yang, Chaoqi, Wang, Ruijie, Yao, Shuochao, Liu, Shengzhong, Abdelzaher, Tarek
Oversmoothing has been assumed to be the major cause of performance drop in deep graph convolutional networks (GCNs). The evidence is usually derived from Simple Graph Convolution (SGC), a linear variant of GCNs. In this paper, we revisit graph node classification from an optimization perspective and argue that GCNs can actually learn anti-oversmoothing, whereas overfitting is the real obstacle in deep GCNs. This work interprets GCNs and SGCs as two-step optimization problems and provides the reason why deep SGC suffers from oversmoothing but deep GCNs does not. Our conclusion is compatible with the previous understanding of SGC, but we clarify why the same reasoning does not apply to GCNs. Based on our formulation, we provide more insights into the convolution operator and further propose a mean-subtraction trick to accelerate the training of deep GCNs. We verify our theory and propositions on three graph benchmarks. The experiments show that (i) in GCN, overfitting leads to the performance drop and oversmoothing does not exist even model goes to very deep (100 layers); (ii) mean-subtraction speeds up the model convergence as well as retains the same expressive power; (iii) the weight of neighbor averaging (1 is the common setting) does not significantly affect the model performance once it is above the threshold ( 0.5).
The Truly Deep Graph Convolutional Networks for Node Classification
Rong, Yu, Huang, Wenbing, Xu, Tingyang, Huang, Junzhou
Existing Graph Convolutional Networks (GCNs) are shallow---the number of the layers is usually not larger than 2. The deeper variants by simply stacking more layers, unfortunately perform worse, even involving well-known tricks like weight penalizing, dropout, and residual connections. This paper reveals that developing deep GCNs mainly encounters two obstacles: \emph{over-fitting} and \emph{over-smoothing}. The over-fitting issue weakens the generalization ability on small graphs, while over-smoothing impedes model training by isolating output representations from the input features with the increase in network depth. Hence, we propose DropEdge, a novel technique to alleviate both issues. At its core, DropEdge randomly removes a certain number of edges from the input graphs, acting like a data augmenter and also a message passing reducer. More importantly, DropEdge enables us to recast a wider range of Convolutional Neural Networks (CNNs) from the image field to the graph domain; in particular, we study DenseNet and InceptionNet in this paper. Extensive experiments on several benchmarks demonstrate that our method allows deep GCNs to achieve promising performance, even when the number of layers exceeds 30---the deepest GCN that has ever been proposed.